Beyond Multivariate Microaggregation for Large Record Anonymization
Abstract
Microaggregation is one of the most commonly employed microdata protection methods. Its basic idea is to anonymize data by aggregating the original records into small groups of at least k elements, thereby preserving k-anonymity. When records are large, i.e., when the data set has many attributes, the data set is usually split into smaller blocks of attributes and microaggregation is applied to each block, successively and independently, in order to limit information loss. This is called multivariate microaggregation. With this technique, the information loss caused by collapsing several values to the centroid of their group is reduced. Unfortunately, with multivariate microaggregation the k-anonymity property is lost when the intruder knows at least two attributes from different blocks, which might be the usual case. In this work, we present a new microaggregation method called one dimension microaggregation (Mic1D-k). With Mic1D-k, the problem of k-anonymity loss is mitigated by mixing all the values in the original microdata file into a single non-attributed data set, using a set of simple pre-processing steps, and then microaggregating all the mixed values together. Our experiments on real data show that our proposal obtains lower disclosure risk than previous approaches while information loss stays at the same level.
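The idea of mixing all values into one set and microaggregating them together can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the pre-processing steps, so per-attribute standardization is assumed here, and the grouping is a simple fixed-size heuristic over the sorted values. The function name `mic1d_sketch` and its return values are hypothetical.

```python
from statistics import fmean, pstdev


def mic1d_sketch(rows, k):
    """Illustrative one-dimensional microaggregation of a numeric table.

    Assumptions (not taken from the paper): attributes are standardized
    before mixing, and groups are formed from k consecutive sorted values.
    Returns the masked table and the group index of every cell.
    """
    n_cols = len(rows[0])
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    # A constant attribute has zero deviation; use 1.0 to avoid dividing by zero.
    stds = [pstdev(c) or 1.0 for c in cols]

    # Mix all cells into one non-attributed list: (standardized value, row, column).
    cells = [
        ((v - means[j]) / stds[j], i, j)
        for i, row in enumerate(rows)
        for j, v in enumerate(row)
    ]
    if len(cells) < k:
        raise ValueError("need at least k values to form one group")
    cells.sort(key=lambda c: c[0])

    n_groups = len(cells) // k
    masked = [[0.0] * n_cols for _ in rows]
    groups = [[0] * n_cols for _ in rows]
    for g in range(n_groups):
        start = g * k
        # The last group absorbs the remainder, so every group has >= k values.
        stop = len(cells) if g == n_groups - 1 else start + k
        members = cells[start:stop]
        centroid = fmean([z for z, _, _ in members])
        for _, i, j in members:
            # Map the shared centroid back to the scale of the cell's attribute.
            masked[i][j] = centroid * stds[j] + means[j]
            groups[i][j] = g
    return masked, groups
```

In this sketch every masked cell shares its standardized value with at least k - 1 other cells, whichever attribute they come from. This is the property that microaggregating each block of attributes independently does not provide across blocks.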
Similar resources
Repeated Record Ordering for Constrained Size Clustering
One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well researched constrained clustering algorithms is called microaggregation. In a microaggreg...
Optimal Multivariate 2-Microaggregation for Microdata Protection: A 2-Approximation
Microaggregation is a special clustering problem where the goal is to cluster a set of points into groups of at least k points in such a way that groups are as homogeneous as possible. Microaggregation arises in connection with anonymization of statistical databases for privacy protection (k-anonymity), where points are assimilated to database records. A usual group homogeneity criterion is wit...
Using genetic algorithms for attribute grouping in multivariate microaggregation
Anonymization techniques that provide k-anonymity suffer from loss of quality when the data dimensionality is high. Microaggregation techniques are not an exception. Given a set of records, attributes are grouped into non-intersecting subsets and microaggregated independently. While this improves quality by reducing the loss of information, it usually leads to the loss of the k-anonymity proper...
Utility preserving query log anonymization via semantic microaggregation
Query logs are of great interest for scientists and companies for research, statistical and commercial purposes. However, the availability of query logs for secondary uses raises privacy issues since they allow the identification and/or revelation of sensitive information about individual users. Hence, query anonymization is crucial to avoid identity disclosure. To enable the publication of pri...